Use of Differentiable and Nondifferentiable Optimization Algorithms for Variational Data Assimilation with Discontinuous Cost Functions

Authors

  • S. Zhang
  • X. Zou
  • J. Ahlquist
  • I. M. Navon
  • J. G. Sela
Abstract

Cost functions formulated in four-dimensional variational data assimilation (4D-Var) are nonsmooth in the presence of discontinuous physical processes (i.e., the presence of "on-off" switches in NWP models). The adjoint model integration produces values of subgradients, instead of gradients, of these cost functions with respect to the model's control variables at discontinuous points. Minimization of these cost functions using conventional differentiable optimization algorithms may encounter difficulties. In this paper we use an idealized discontinuous model and an actual shallow convection parameterization, both including "on-off" switches, to illustrate the performances of differentiable and nondifferentiable optimization algorithms. It was found that (i) differentiable optimization, such as the limited memory quasi-Newton (L-BFGS) algorithm, may still work well for minimizing a nondifferentiable cost function, especially when the changes made in the forecast model at switching points to the model state are not too large; (ii) for a differentiable optimization algorithm to find the true minimum of a nonsmooth cost function, introducing a local smoothing that removes discontinuities may cause more problems than solutions due to the insertion of artificial stationary points; and (iii) a nondifferentiable optimization algorithm is found to be able to find the true minima in cases where the differentiable minimization failed. For the case of strong smoothing, differentiable minimization performance is much improved, as compared to the weak smoothing cases.
1 Introduction

Since optimal control theory was introduced into atmospheric data assimilation by Le Dimet and Talagrand (1986), four-dimensional variational data assimilation (4D-Var) has been used for atmospheric data assimilation, initially in a research mode, and currently moving toward an operational implementation (Thépaut and Courtier, 1991; Navon et al., 1992; Zupanski and Mesinger, 1995; Kuo et al., 1996; Zou, 1997; Rabier et al., 1998; Zou et al., 1999). The 4D-Var approach adjusts the control variables of a numerical weather prediction (NWP) model to their optimal values by minimizing a specified error measurement called the cost function. Information in observations is extracted, based not only on background and observational error covariances, but also on model dynamical and physical constraints. Due to its sound mathematical basis, 4D-Var is receiving more and more attention in the applications of NWP.

The optimization procedure in 4D-Var is completed by employing a large-scale unconstrained minimization algorithm. Among several choices, the limited memory quasi-Newton algorithm (Broyden-Fletcher-Goldfarb-Shanno, 1970; Liu and Nocedal, 1989; hereafter referred to as L-BFGS) was found to be one of the most efficient when the dimension of the control variable is large, i.e., greater than $10^5$ (Zou et al., 1993). The most attractive advantages of the L-BFGS algorithm are its modest storage requirement and good convergence rate (Zou et al., 1993). The algorithm was originally designed for minimizing differentiable cost functions.

Early 4D-Var experiments used either simple models or adiabatic primitive equation models without parameterized physics. The cost functions defined by these models are differentiable since solutions of the assimilation model are differentiable and the governing equations are uniformly valid over a global domain.
Therefore, the L-BFGS algorithm, which is based on the assumption that the cost function is differentiable, has performed very well (Navon et al., 1992; Zou et al., 1993).

Nonsmooth cost functions, i.e., cost functions with discontinuous gradients, arise in 4D-Var when assimilation models include discontinuous physical processes. A more complex diabatic model including parameterized physics simulates the evolution of the true atmospheric state better than an adiabatic one. Various physical processes are included in the conservation equations of physical variables such as momentum, mass and energy. The physical processes are controlled by local conditions that form a set of "IF" statements depending on model variables and prescribed threshold values. These "IF" statements are often called "on-off" switches since they determine whether a certain physical process should be turned on or off. The presence of "on-off" switches renders the diabatic model solution nondifferentiable, which in turn makes the cost function defined by a diabatic model nondifferentiable even if the cost function itself is a continuous function (usually a quadratic form) of the model solution.

Computational methods of smooth and nonsmooth optimization were developed that do not assume a specific structure of the minimized function, but require only the evaluation of the function and its gradients (or their analogues in the nondifferentiable case, which are referred to as generalized gradients or subgradients) at any given point. For a class of almost everywhere differentiable functions (such as piecewise differentiable functions), the subgradients at points of nondifferentiability can be defined by taking limits of gradients at nearby points, or any convex combination of these limits (a linear combination with nonnegative coefficients summing to unity).

Attempts have been made to carry out 4D-Var using a diabatic model to assimilate rainfall observations (Zupanski et al., 1995; Zou and Kuo, 1996; Tsuyuki, 1997) and to improve analysis in the tropics (Rabier et al., 1999).
There are at least two methods that have dealt with the "on-off" switches in the physical parameterization schemes used in 4D-Var. One consists of introducing a transitional function to make the nonlinear model solution smooth, i.e., carrying out a local smoothing to remove the discontinuities caused by "on-off" switches (Zupanski, 1993; Tsuyuki, 1997). The other keeps the "on-off" switches in the tangent linear and the adjoint models the same as in the nonlinear model (Zou et al., 1993). As discussed in Zhang and Zou (1999), an integration of the adjoint model in which the "on-off" switches are kept the same as in the nonlinear model provides the gradient of the cost function at a differentiable point and a subgradient (the limit of gradients close to the discontinuous point) at a nondifferentiable point.

The questions that need to be answered are: (i) Does local smoothing eliminate problems related to "on-off" switches? (ii) Does it cause other problems? (iii) What are the potential problems when a differentiable optimization algorithm is employed for minimizing a nondifferentiable cost function?

It is worth mentioning that in practice there is no sharp distinction between nonsmooth and smooth functions. From the point of view of applied mathematics and computational practice, a function with a rapidly changing gradient is similar in its properties to a nonsmooth cost function. Therefore, one may expect that a differentiable optimization algorithm may work well for minimizing a nondifferentiable cost function.

In this paper, a simple model containing a typical form of the discontinuity in an NWP model is used to illustrate the performance of the L-BFGS algorithm with and without removing discontinuities by smooth functions. A nondifferentiable optimization algorithm (Lemaréchal, 1978, 1989; Lemaréchal and Sagastizabal, 1997) is employed to find the minimum of a cost function for cases when the L-BFGS fails to find the true minimum.
Then a physical parameterization scheme (the shallow convection scheme in the NCEP global spectral model) is used to examine the performance of both the differentiable and the nondifferentiable minimization algorithms (section 3). Conclusions and discussions are presented in section 4.

2 A discontinuous cost function using an idealized simple model with "on-off" switches

2.1 A nonsmooth cost function defined by a discontinuous model

We use an idealized model with a single variable and a typical form of discontinuous physics to examine the behavior of a differentiable minimization in solving nonsmooth problems. The numerical forecast model takes the form

$$\frac{\partial x}{\partial t} = \begin{cases} f_1(x) & \text{if } x < x_c \\ f_2(x) & \text{if } x \ge x_c \end{cases} \tag{1}$$

where $f_1(x)$ and $f_2(x)$ represent two parameterized physical processes identified by a threshold value $x_c$. We consider a cost function defined as the quadratic form of the model-predicted state at time $t_R$:

$$J_1(x_0) = x^2(t_R) \tag{2}$$

and

$$\frac{\partial x}{\partial t} = \begin{cases} 2x - 2 & \text{as } x < 1 \\ x - 4 & \text{as } x \ge 1 \end{cases} \tag{3}$$

where $x_0 = x(t_0)$ and $t_R > t_0$. The discretization of the forecast model is carried out using a time step $\Delta t = 0.1$. The length of the assimilation time window is $t_R = t_0 + 4\Delta t$.

2.2 Performance of the L-BFGS algorithm

The performance of the limited-memory quasi-Newton (L-BFGS) method of Liu and Nocedal (1989) is examined with different initial guess values for $x_0$. Figure 1 shows that the minimization using the L-BFGS method converged with the initial guess of a) $x_0^{(0)} = 2.00$, b) $x_0^{(0)} = 2.29$, c) $x_0^{(0)} = 2.80$ or d) $x_0^{(0)} = 2.90$. However, the minimization failed to find the true minimum $x_0^* = 0.5$ when started from an initial guess of $x_0^{(0)} = 2.55$ (Fig. 2). After about five iterations, the minimization was trapped in a local minimum generated by the "on-off" switches, causing the iterative procedure to switch back and forth between $x_0 = 1.521$ ($J = 0.41$) and $x_0 = 1.518$ ($J = 0.99$).
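The idealized model (3) and the cost function (2) are small enough to reproduce directly. The following is a minimal sketch, assuming a forward-Euler time discretization (the text states only the time step and the window length, not the scheme). The names `tendency`, `tendency_slope` and `cost_and_gradient` are illustrative and do not come from the paper. The derivative is accumulated by the chain rule with the switch kept on the same branch as in the forward integration, which is the scalar analogue of keeping the "on-off" switches unchanged in the adjoint model.

```python
DT = 0.1        # time step from section 2.1
N_STEPS = 4     # assimilation window t_R = t_0 + 4 * DT
X_C = 1.0       # threshold of the "on-off" switch in (3)


def tendency(x, offset=4.0):
    """Right-hand side of (3): 2x - 2 below the threshold, x - offset otherwise."""
    return 2.0 * x - 2.0 if x < X_C else x - offset


def tendency_slope(x):
    """Slope of the branch that is active at x (switch kept as in the forward run)."""
    return 2.0 if x < X_C else 1.0


def cost_and_gradient(x0, offset=4.0):
    """Return J1(x0) = x(t_R)**2 and dJ1/dx0 along the branches actually taken."""
    x = x0
    sensitivity = 1.0                                  # d x_n / d x_0
    for _ in range(N_STEPS):
        sensitivity *= 1.0 + DT * tendency_slope(x)    # d x_{n+1} / d x_n
        x = x + DT * tendency(x, offset)               # forward Euler (assumed scheme)
    return x * x, 2.0 * x * sensitivity


# The two points between which the minimization is reported to oscillate.
for guess in (1.518, 1.521):
    cost, grad = cost_and_gradient(guess)
    print(f"x0 = {guess:.3f}  J = {cost:.2f}  dJ/dx0 = {grad:.3f}")
```

Under these assumptions the two cost values agree with the J = 0.99 and J = 0.41 quoted above for $x_0 = 1.518$ and $x_0 = 1.521$, which illustrates how a change of 0.003 in the initial condition moves the trajectory across the switch and changes the cost by more than a factor of two.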
In order to gain some insight into the general performance of the L-BFGS method, we selected 300 points in the interval $[0, 3]$ for $x_0^{(0)}$, incremented by 0.01 between successive initial guess points. Numerical results are presented in Table 1 (no-smoothing results). The L-BFGS minimization failed to find the true minimum in 18 cases. In the other 282 cases, it converged to the true solution.

We found that the total number of minimization failures depends strongly on the size of the jump in the forecast model, and thus the size of the jump in the cost function. When $f_2(x) = x - 4$ is replaced with $f_2(x) = x - 3.5$ (i.e., the size of the jump is reduced), the total number of failing cases is reduced from 18 to 13. If the size of the jump is further reduced, say $f_2(x) = x - 3$, the L-BFGS algorithm successfully finds the solution ($x_0 = 0.5$) for all 300 cases tested, while increasing the jump size generally leads to more failing cases.

This example suggests that the differentiable minimization may still work well for solving nondifferentiable 4D-Var problems, especially when the sizes of the jumps caused by discontinuous physics are small. However, it is possible that the differentiable minimization may remain stuck at a local minimum and fail to find the true minimum. The possibility of encountering a minimization failure in the presence of a discontinuity is greater when the forecast model contains larger jumps.

2.3 Introducing a local smooth function to remove discontinuities

Given that the presence of discontinuities associated with "on-off" switches in the 4D-Var assimilation model may cause difficulties for the minimization in finding the true minimum, a natural step to remedy this situation is to introduce a smooth transitional function to remove the discontinuities (Zupanski et al., 1995; Tsuyuki, 1997).
For example, a function

$$f_{smooth} = 0.5\,\{1 + \tanh[\beta (x - x_c)]\} \tag{4}$$

is introduced into the assimilation model at threshold points, where $\beta$ is a scalar which controls the accuracy of the smoother. The transition is implemented by calculating $f_{transition}$ as

$$f_{transition}(x) = \begin{cases} \dfrac{f_1(0.9) + f_2(1.1)}{2} + \left[\dfrac{f_1(0.9) + f_2(1.1)}{2} - f_1(x)\right](2 f_{smooth} - 1) & \text{if } x_c - h \le x < x_c \\[2ex] \dfrac{f_1(0.9) + f_2(1.1)}{2} - \left[\dfrac{f_1(0.9) + f_2(1.1)}{2} - f_2(x)\right](2 f_{smooth} - 1) & \text{if } x_c \le x < x_c + h \end{cases} \tag{5}$$

and the forecast model after smoothing is defined as

$$\frac{\partial x}{\partial t} = f(x), \qquad f(x) = \begin{cases} f_1(x) & \text{if } x < x_c - h \\ f_{transition}(x) & \text{if } x_c - h \le x < x_c + h \\ f_2(x) & \text{if } x \ge x_c + h \end{cases} \tag{6}$$

where $h$ is a small positive scalar.

Choosing $h = 0.1$ and $\beta = 100$ (a weak smoothing), the introduction of the transitional smooth function produces a smooth distribution of the source term $f(x)$ (thick dotted line in Fig. 4), and a continuous distribution of the cost function in the control variable space (dotted line in Fig. 5a). However, the introduction of the smooth function into the forecast model changes the distribution of the gradient of $J$ with respect to the control parameter (Fig. 5b), introducing additional stationary points. The cost function without smoothing has a unique stationary point (zero gradient) which corresponds to the true solution ($x_0 = 0.5$) (solid line in Fig. 5b), while the one with smoothing has two extra stationary points for each jump (dotted line in Fig. 5b).

Therefore, although the gradient may be discontinuous when the "on-off" switches in the adjoint model are kept the same as in the nonlinear model, such a discontinuity in the gradient does not change the general convexity feature, and the adjoint model integration provides useful subgradient information (see section 2.4 and Zhang et al., 2000). However, a smooth function which seems to remove the discontinuity may introduce false stationary points, which may cause the minimization to converge to a wrong solution (Zou et al., 1993).
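The smoothed source term of (4)-(6) can be written out for the idealized model (3). This is a sketch under stated assumptions: `beta` is the name used here for the scalar in (4) that controls the sharpness of the smoother, and the midpoint value is taken as the average of $f_1(x_c - h)$ and $f_2(x_c + h)$, which equals the $f_1(0.9)$, $f_2(1.1)$ average of (5) when $h = 0.1$ and the threshold is 1. The function names are illustrative only.

```python
import math

X_C = 1.0   # switch threshold of the idealized model (3)


def f1(x):
    """Branch of (3) used below the threshold."""
    return 2.0 * x - 2.0


def f2(x):
    """Branch of (3) used at or above the threshold."""
    return x - 4.0


def smoothed_tendency(x, h=0.1, beta=100.0):
    """Source term f(x) of (6) with the tanh transition of (4) and (5)."""
    if x < X_C - h:
        return f1(x)
    if x >= X_C + h:
        return f2(x)
    # Average of the two branches at the edges of the transition zone
    # (assumed generalization of the f1(0.9), f2(1.1) average in (5)).
    mid = 0.5 * (f1(X_C - h) + f2(X_C + h))
    f_smooth = 0.5 * (1.0 + math.tanh(beta * (x - X_C)))   # equation (4)
    blend = 2.0 * f_smooth - 1.0                            # runs from -1 to +1
    if x < X_C:
        return mid + (mid - f1(x)) * blend
    return mid - (mid - f2(x)) * blend


# Sample the transition zone for the weak-smoothing parameters h = 0.1, beta = 100.
for x in (0.85, 0.90, 0.95, 1.00, 1.05, 1.0999, 1.15):
    print(f"x = {x:.4f}  f(x) = {smoothed_tendency(x):+.4f}")
```

With the weak-smoothing parameters the transition joins $f_1$ at $x_c - h$ and $f_2$ at $x_c + h$ to within about $10^{-8}$ and passes through the midpoint value at the threshold, so the jump is replaced by a steep but continuous ramp.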
We indicate, however, that when a very strong smoothing is applied (e.g., the value of $\beta$ is reduced and that of $h$ is increased), the zig-zag behavior of the cost function can be eliminated and the artificial stationary points will not exist. Figure 6 shows the distributions of the smoothed $f(x)$ for $h = 0.2$, $\beta = 20$ (thick dashed line) and $h = 0.5$, $\beta = 5$ (thick dotted-dashed line). The resulting cost function and gradient distributions are displayed in Fig. 7a and Fig. 7b, respectively (thick dashed line for $h = 0.2$, $\beta = 20$ and thick dotted-dashed line for $h = 0.5$, $\beta = 5$). It is found that with a strong smoothing, the zig-zag behavior of the cost function is eliminated and the minimization of the smoothed cost function using a differentiable optimization algorithm performs well. But the impact of such a strong smoothing on model solutions needs to be investigated before it is applied in 4D-Var.

In order to test the performance of the L-BFGS method for the cost function in which the discontinuity is removed by a transitional smooth function, we repeated the experiments of section 2.2, this time with the local smoothing function introduced at the switch point. Figure 8 displays the performance of the L-BFGS algorithm on the smoothed cost function with the same length of time integration and initial guess points as in Fig. 1. For all four cases examined, the minimization algorithm became stuck at a local stationary point instead of the true minimum.

When all 300 cases are repeated with a weak local smoothing, we find that there are cases for which the minimization with smoothing succeeds and the one without smoothing fails. Overall, however, the introduction of a weak smoothing increases the total number of failed cases, which is not observed in the case of strong smoothing. Table 1 summarizes the performances of the L-BFGS algorithm without smoothing, with a weak smoothing ($h = 0.1$ and $\beta = 100$), and with a strong smoothing ($h = 0.5$ and $\beta = 5$).
Compared with the results without smoothing, the introduction of a weak smooth function increased the total number of failed cases by a factor of two. For $f_2(x) = x - 3.0$, $x - 2$, or $x - 1.5$, the introduction of a weak smooth function produces 18 to 19 failed cases for the L-BFGS method, while the L-BFGS minimization had no problem finding the true minimum without smoothing. These results indicate that local smoothing of discontinuous physics may sometimes do more harm than good in 4D-Var.

Table 1: The performance of the L-BFGS method (number of failed cases / total number of cases)

f2(x) in the model   Jump size   No smoothing   Weak smoothing   Strong smoothing
x - 4.5              3.5         37/300         54/300           0/300
x - 4.4              3.4         24/300         54/300           0/300
x - 4.3              3.3         28/300         54/300           0/300
x - 4.2              3.2         19/300         44/300           0/300
x - 4.1              3.1         15/300         52/300           0/300
x - 4.0              3.0         18/300         37/300           0/300
x - 3.5              2.5         13/300         20/300           0/300
x - 3.0              2.0         0/300          18/300           0/300
x - 2.0              1.0         0/300          18/300           0/300
x - 1.5              0.5         0/300          19/300           0/300

2.4 Performance of the bundle method for nonsmooth cost functions

Most differentiable minimization algorithms depend on two basic assumptions: (i) the negative gradient at a given point is the steepest descent direction and is used to approximate the search direction; and (ii) the cost function achieves a monotone and significant decrease along the search direction at each iteration. For a convex nonsmooth function, the direction negative to that of a subgradient is not always a direction of descent. Differentiable unconstrained minimization algorithms, such as L-BFGS, can therefore fail in minimizing a nonsmooth cost function, as was shown in section 2.2 for the L-BFGS method. Nondifferentiable minimization algorithms, in contrast, rely only on the basic facts of convex analysis, which hold for convex functions including nondifferentiable ones. There are various types of nonsmooth optimization algorithms. The basic mathematical considerations related to the implementation of nonsmooth optimization are briefly described below.
2.4.1 Bundle methods in nonsmooth optimization

A nondifferentiable minimization algorithm uses subgradients (also called generalized gradients) instead of gradients to attempt to force the function to decrease along the subgradient direction. A vector [...] $> 0$ is some weighting parameter and $v$ is a constant term (scalar) in the quadratic form, and

$$\alpha_J(x^{(k)}, y^{(j)}) = J(x^{(k)}) - J(y^{(j)}) - (\xi_J^{(j)})^T (x^{(k)} - y^{(j)})$$

denotes the linearization error at $x^{(k)}$.

The next bundle iteration point is obtained via the following line-search strategy. Let $y^{(k+1)} = x^{(k)} + \lambda^{(k)} d^{(k)}$ for some $\lambda^{(k)} > 0$, and let $\xi_J^{(k+1)} \in \partial J(y^{(k+1)})$. Then (i) if $J(y^{(k+1)}) \le J(x^{(k)}) - \delta^{(k)}$ for some $\delta^{(k)} > 0$, make a serious step $x^{(k+1)} = y^{(k+1)}$; (ii) otherwise, make a null step $x^{(k+1)} = x^{(k)}$. In both cases add $\xi_J^{(k+1)}$ to the existing bundle.

2.4.2 Numerical results

We choose to use the bundle method of Lemaréchal (1977, 1978) and compare its performance with that of the L-BFGS method. Subgradients (or subdifferentials), instead of gradients, are used in this method at nondifferentiable points. From (8), we know that this method applies smoothing to the gradient rather than to the function itself in order to seek a descent direction. The advantage of smoothing the gradient is that the original problem is not changed. However, more local gradients need to be evaluated in order to implement the smoothing. Therefore, the bundle method is much more expensive than the L-BFGS.

We arbitrarily choose four initial guess points from which the L-BFGS, both with and without local smoothing, failed to converge to the true solution. Figure 9 shows the performance of the bundle method for these cases. It is encouraging to find that the bundle method converged to the true solution in all four cases.

We then proceeded with more experiments. We repeated all the cases (a total of 49) for which either the L-BFGS without smoothing or the L-BFGS with smoothing failed. Numerical results are presented in Fig. 10.
We observe that there are 18 cases where the L-BFGS method without smoothing failed to find the true minimum, and 37 cases where the L-BFGS method with smoothing failed. There are six cases where both the L-BFGS without smoothing and the L-BFGS with smoothing failed. The bundle method succeeded in all 49 cases for which the L-BFGS with or without smoothing failed. We mention, however, that there are 12 cases (out of 300 points) where the bundle method failed to find the true minimum and where the minimization using the L-BFGS method with and without smoothing was successful. This indicates that the performance of the smooth and nonsmooth optimization algorithms is case-dependent. However, we have not found a case where both the L-BFGS and the bundle method failed.

3 A discontinuous cost function using a shallow convection observation operator

3.1 Shallow convection

Shallow convection is sometimes called dry convection. In the 1960s, it was mainly used to allow convective adjustment in response to radiative heating so that a thermal equilibrium profile closer to the observations could be obtained (Manabe and Strickler, 1964). In the 1980s, it began to be widely used in numerical modeling as a necessary compensation when deep convection does not occur (Betts, 1986).

Shallow convection is a parameterization scheme which produces a vertical thermodynamical adjustment in the atmosphere. Unlike deep convection, which has a larger vertical scale due to water vapor convergence and condensational heating, shallow convection only deals with the vertical diffusion of unstable energy. It does not involve precipitation and condensation, and its vertical scale is rather small.

The diffusion equations used in the shallow convection of the NCEP spectral model are given by

$$\frac{\partial T}{\partial t} = \frac{\partial}{\partial z}\left[ K_{QT} \left( \frac{\partial T}{\partial z} + \Gamma \right) \right] \tag{14}$$

$$\frac{\partial q}{\partial t} = \frac{\partial}{\partial z}\left( K_{QT} \frac{\partial q}{\partial z} \right) \tag{15}$$

where $T$ is temperature, $q$ is specific humidity, $K_{QT}$ is the diffusion coefficient for the temperature and humidity, and $\Gamma$ is the dry adiabatic lapse rate.
The discretized version of these diffusion equations forms a tridiagonal system which is solved using Gaussian elimination. In the physical package of the NCEP global spectral model, all the unstable columns in which cumulus does not occur are picked up immediately following the Arakawa-Schubert cumulus scheme. We can summarize it in three steps:

1. Calculate the moist static energy ($gZ + C_p T + Lq$) and identify the columns with conditional instability ($\frac{\partial}{\partial Z}(gZ + C_p T + Lq) < 0$). Calculate the lifting condensation level as the cloud base and the highest instability level as the cloud top. Because of the vertical discretization, the cloud base/top does not change continuously when the moist static energy profile is perturbed by a change of the thermodynamical condition.

2. Assign $K_{QT} = 1.5$ m$^2$/s at the base, $K_{QT} = 1.0$ m$^2$/s at the top, $K_{QT} = 3.0$ m$^2$/s for the next-to-top layers, and $K_{QT} = 5.0$ m$^2$/s for any other layers, preventing the development of unrealistic kinks in the $T$ and $q$ profiles. The assignment of different values of the diffusion coefficient to different cloud layers may cause discontinuities in the solution of the shallow convection adjustment.

3. Gaussian elimination is then employed to solve the resulting tridiagonal system, and the adjusted temperature and specific humidity profiles are obtained for the identified unstable columns.

The first two steps of the computational implementation of the shallow convection parameterization may introduce discontinuities into the distribution of a cost function, defined by the shallow convection, with respect to the model temperature and specific humidity profiles. Figure 11 shows, for example, the distribution of the temperature adjustments at three model levels resulting from the shallow convection with various input values of the temperature at model level 3. The shallow convection process is turned on and off by changing only the temperature at level 3 from 258.5 K to 262.5 K at an interval of 0.01 K. The adjustment occurred at three levels: levels 2, 3, and 4.
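Step 3 of the summary above relies on Gaussian elimination for a tridiagonal system. A minimal sketch of that elimination (commonly called the Thomas algorithm) follows. The function name, the argument layout and the 3-by-3 example system are illustrative assumptions; they are not taken from the NCEP code, and no pivoting is performed, which is adequate only for well-conditioned (e.g., diagonally dominant) systems.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal linear system by Gaussian elimination (no pivoting).

    Row i reads: lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1] = rhs[i].
    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n    # modified super-diagonal after forward elimination
    d = [0.0] * n    # modified right-hand side after forward elimination
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        pivot = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / pivot if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / pivot
    # Back substitution from the last level to the first.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Small diffusion-like system whose exact solution is [1, 1, 1].
profile = solve_tridiagonal(
    lower=[0.0, -1.0, -1.0],
    diag=[2.0, 2.0, 2.0],
    upper=[-1.0, -1.0, 0.0],
    rhs=[1.0, 0.0, 1.0],
)
print(profile)
```

The elimination itself is a smooth function of the matrix entries and the right-hand side; in the scheme described above, the discontinuities enter through steps 1 and 2, which decide which levels and which coefficient values make up the system.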
The solution of the shallow convection is clearly discontinuous with respect to the input variable.

3.2 Experiment design and cost function

In order to test the performances of the L-BFGS and the nonsmooth optimization bundle algorithms using a realistic physical parameterization, the shallow convection operator is used to define a twin experiment. The cost function is defined to measure the distance between the output of the shallow convection operator (adjusted temperature and specific humidity profiles) and "observations" (the output temperature and specific humidity profiles for a selected pair of input profiles of temperature and specific humidity, i.e., the true solution):

$$J_2(T, q) = \left(H_T(T, q) - T^{obs}\right)^T W_T \left(H_T(T, q) - T^{obs}\right) + \left(H_q(T, q) - q^{obs}\right)^T W_q \left(H_q(T, q) - q^{obs}\right) \tag{16}$$

where $W_T$ and $W_q$ are constant diagonal weighting matrices whose values are set empirically to $10^{-5}$ and $10^{6}$, corresponding to typical orders of simulated temperature and specific humidity errors of 1°C and 1 g kg$^{-1}$. The variables $T$ and $q$ in (16) are vectors of dimension $K_0$, which is equal to 10, the highest model level which may be affected by the shallow convection process.

The gridded data of temperature ($T$), specific humidity ($q$) and surface pressure ($p_s$) at 00 UTC on 1 July 1995 from the NCEP reanalysis are used for testing the performances of the L-BFGS method and the nonsmooth bundle method for minimizing $J_2$ defined in (16). For a resolution of T62L28, there are a total of 47 × 384 columns over the entire global domain. We choose 384 columns corresponding to all the Gaussian grid points around latitude 12°N for the test.

3.3 Performances of the L-BFGS and bundle algorithms

Among the 384 columns, there are 51 columns in which the shallow convection process is turned on. Within these 51 columns, we selected column 111 (near 150°W) as the truth, forming the "observations" in (16).
Starting from the analysis profiles at the other 383 columns, we applied both the L-BFGS and bundle algorithms to approximate the "true" atmospheric profiles of temperature and specific humidity. The L-BFGS algorithm succeeded in 380 of the 383 cases, failing only for three columns: 122 (at 132°W), 135 (at 107°W) and 145 (at 88°W). The bundle algorithm converged to the true solution for all 383 cases.

Tables 2-4 show the results of the minimization starting from the temperature and specific humidity profiles at the 122nd, 135th, and 145th columns, respectively. Figure 12 shows the variations of the normalized cost function and the norm of the gradient (or subgradient) for the minimization starting from the temperature and specific humidity profiles at the 135th column. With six iterations the L-BFGS algorithm decreased the cost functions by about two orders of magnitude. The bundle algorithm decreased the cost functions by 3-5 orders of magnitude. The temperature and specific humidity profiles retrieved by the bundle method are much more accurate (by more than an order of magnitude) than those retrieved by the L-BFGS method. The bundle method uses more function calls at each iteration than the L-BFGS method. The computational expense of the bundle method is about twice that of the L-BFGS method.

Table 2: Statistics on minimization of the L-BFGS and bundle methods for the example of shallow convection with column 122 as the initial guess

            Function calls     RMS error T (°C)   RMS error q (g/kg)   J
Iteration   L-BFGS   bundle    L-BFGS   bundle    L-BFGS   bundle      L-BFGS   bundle
0           -        -         2.456    2.456     0.415    0.415       8.613    8.613
1           1        2         1.794    0.387     0.315    0.012       4.919    0.014
2           1        5         0.711    0.211     0.091    0.009       0.112    0.009
3           3        1         0.709    0.172     0.091    0.008       0.112    0.003
4           1        2         0.677    0.083     0.088    0.006       0.100    0.001
5           2        5         0.677    0.030     0.088    0.002       0.100    1.5 x 10^-4
6           2        3         0.677    0.023     0.088    0.002       0.100    0.8 x 10^-4

Figures 13-16 present the temperature and specific humidity profiles obtained at each iteration using the L-BFGS method (Figs. 13 and 15) and the bundle method (Figs. 14 and 16).
We find that the adjustment made by shallow convection to the input profiles of $T$ and $q$ (solid line with stars) for simulated "observations" occurred at three model levels, 3, 4 and 5, and was rather small (not shown). After the 2nd iteration, the L-BFGS minimization was stuck in a local minimum with switching turned on and off on several model levels, and had difficulty getting free and approaching the true minimum. Such a phenomenon was not seen with the bundle algorithm. Within four iterations, the minimization results approximated the true solution closely. By six iterations, differences between the retrieved and the "observed" profiles of temperature and specific humidity were negligible. This is due to a property of the bundle algorithm: in the first few iterations the algorithm collects information about the cost function by bundling the subgradients. After 4-5 iterations, the subgradient bundle approximates the whole generalized gradient well and the "optimal" descent direction is generated.

In order to ensure a sufficient decrease in the value of the cost function, the bundle method evaluates many subgradients at each iteration. For example, 10 and 19 function calls are required by the L-BFGS and the bundle method, respectively, to complete the six iterations. Therefore, this algorithm is much more expensive than the L-BFGS algorithm.

Table 3: Statistics on minimization of the L-BFGS and bundle methods for the example of shallow convection with column 135 as the initial guess

            Function calls     RMS error T (°C)   RMS error q (g/kg)      J
Iteration   L-BFGS   bundle    L-BFGS   bundle    L-BFGS   bundle         L-BFGS   bundle
0           -        -         2.245    2.245     0.366    0.366          6.787    6.787
1           1        2         1.636    0.015     0.266    0.011          3.570    0.003
2           1        5         0.365    0.033     0.063    0.005          0.046    0.8 x 10^-3
3           4        11        0.365    0.022     0.063    0.004          0.046    0.2 x 10^-3
4           1        2         0.365    0.013     0.063    0.003          0.046    0.6 x 10^-4
5           1        2         0.365    0.009     0.063    0.002          0.046    0.2 x 10^-4
6           1        2         0.365    0.005     0.063    0.6 x 10^-3    0.046    0.4 x 10^-5
Table 4: Statistics on minimization of the L-BFGS and bundle methods for the example of shallow convection with column 145 as the initial guess

            Function calls     RMS error T (°C)   RMS error q (g/kg)   J
Iteration   L-BFGS   bundle    L-BFGS   bundle    L-BFGS   bundle      L-BFGS   bundle
0           -        -         2.162    2.162     0.315    0.315       4.950    4.950
1           1        2         1.402    0.317     0.215    0.010       2.268    0.012
2           1        5         0.434    0.186     0.058    0.010       0.031    0.004
3           2        3         0.413    0.139     0.057    0.007       0.029    0.001
4           2        2         0.407    0.103     0.056    0.006       0.028    0.9 x 10^-3
5           1        2         0.301    0.088     0.051    0.006       0.021    0.8 x 10^-3
6           3        4         0.292    0.077     0.050    0.005       0.019    0.6 x 10^-3

4 Summary and conclusions

The cost function in 4D-Var using a diabatic assimilation model with parameterized physics is only piecewise differentiable (Zhang and Zou, 1999). Differentiable minimization algorithms, such as the limited memory quasi-Newton method, that were originally designed for minimizing differentiable functions can fail. Using both a simple example and a shallow convection scheme, we found:

(i) The L-BFGS algorithm may still work well for minimizing nonsmooth cost functions. If the jumps in the cost function caused by discontinuous physics (controlled by "on-off" switches) are large, the minimization may converge to a local minimum introduced by the "on-off" switches.

(ii) Introducing a weak smooth function to remove the discontinuities associated with "on-off" switches may do more harm than good to data assimilation results. The smoothing could introduce artificial stationary points, which may cause a minimization to converge to a wrong solution.

(iii) The nondifferentiable bundle method performs well for minimizing nonsmooth cost functions, although it is computationally about twice as expensive as the L-BFGS method.

Although a strong smoothing does not prevent the differentiable minimization from converging and may be applied, the consequence of such a strong smoothing for the prediction of the atmospheric state needs to be studied beforehand.
We have used simple models for examining the performance of differentiable and nondifferentiable optimization algorithms for discontinuous cost functions. We plan to test the feasibility of applying the nondifferentiable minimization algorithm, such as the bundle method (Lemaréchal 1989), to 4D-Var using the NCEP adiabatic model and comparing the accuracy of the solution with that using the differentiable minimization methods.

Acknowledgements

This research is supported by NOAA grant NA37WA0361 and NSF grant ATM-9812729. The authors would like to thank Dr. E. Kalnay for her persistent encouragement on this study. We also thank Dr. Claude Lemaréchal from I.N.R.I.A. for providing to us the software for his bundle method. The original bundle algorithm was modified by Profs. Navon and Nazareth to fit the test problems.

REFERENCES

Arakawa, A. and W. H. Schubert, 1974: Interaction of a cumulus ensemble with the large-scale environment, Part I. J. Atmos. Sci., 31, 674-701.

Betts, A. K., 1986: A new convective adjustment scheme. Part I: Observational and theoretical basis. Quart. J. Roy. Meteor. Soc., 112, 677-691.

Broyden, C. G., 1970: The convergence of a class of double rank minimization algorithms. Parts I and II. J. Inst. Maths. Applics., 6, 76-90.

Fletcher, R., 1970: A new approach to variable metric algorithms. Computer Journal, 13(3), 312-322.

Ghil, M., S. Cohn, J. Tavantzis, K. Bube and E. Isaacson, 1981: Applications of estimation theory to numerical weather prediction, in Dynamical meteorology. Data assimilation methods. New York, 139-224.

Goldfarb, D., 1970: A family of variable metric methods derived by variational means. Math. of Comp., 24, 23-26.

Hiriart-Urruty, J.-B. and C. Lemaréchal, 1993: Convex analysis and minimization algorithms II: Advanced theory and bundle methods. Springer-Verlag, Vol. 306, 346pp.

Hong, S. Y. and H. L. Pan, 1996: Nonlocal boundary-layer vertical diffusion in a medium-range model. Mon. Wea. Rev., 124, 2322-2339.

Kanamitsu, M., J. C. Alpert, K. A. Campana, P. M. Caplan, D. G. Deaven, M. Iredell, B. Katz, H.-L. Pan, J. Sela and G. H. White, 1991: Recent changes implemented into global forecast system at NMC. Weather and Forecasting, 6, 425-435.

Kiwiel, K. C., 1985: Methods of descent for nondifferentiable optimization. Springer-Verlag, Lecture notes in mathematics, Vol. 1133, 362pp.

Kuo, Y.-H., X. Zou and Y.-R. Guo, 1996: Variational assimilation of precipitable water using a nonhydrostatic mesoscale adjoint model. Part I: Moisture retrieval and sensitivity experiments. Mon. Wea. Rev., 124, 122-147.

Le Dimet, F. X. and O. Talagrand, 1986: Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. Tellus, 38A, 97-110.

Leith, C. E., 1971: Atmospheric predictability and two-dimensional turbulence. J. Atmos. Sci., 28, 145-161.

Lemaréchal, C., 1977: Bundle methods in nonsmooth optimization. Proceedings IIASA series, C. Lemaréchal and R. Mifflin, Eds., Pergamon Press, 186pp, 79-103.

Lemaréchal, C., 1978: Nonsmooth optimization and descent methods. International Institute for Appl. Sys. Analysis, Laxenburg, Austria, 25pp.

Lemaréchal, C., 1989: Nondifferentiable optimization. Optimization, Handbooks in OR&MS, 1, G. L. Nemhauser et al. (eds.), Elsevier Science Publishers, 529-572.

Lemaréchal, C. and C. Sagastizábal, 1997: Variable metric bundle methods: From conceptual to implementable forms. Mathematical Programming, 76, 393-410.

Liu, D. C., and J. Nocedal, 1989: On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45, 503-528.

Manabe, S. and R. F. Strickler, 1964: Thermal equilibrium of the atmosphere with a convection adjustment. J. Atmos. Sci., 21, 361-385.

Navon, I. M., X. Zou, J. Derber, and J. Sela, 1992: Variational data assimilation with an adiabatic version of the NMC spectral model. Mon. Wea. Rev., 120, 1433-1446.

Nocedal, J., 1980: Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35, 773-782.

Pan, H. L. and W. S. Wu, 1995: Implementing a mass flux convection parameterization package for the NMC medium-range forecast model. Technical Report for NMC/NOAA/NWS, No. 409.

Pierrehumbert, R. T., 1986: An essay on the parameterization of orographic gravity wave drag. GFDI/NOAA, Princeton University, Princeton, NJ 08542.

Poljak, B. T., 1977: Subgradient methods: A survey of Soviet research in nonsmooth optimization. C. Lemaréchal and R. Mifflin, Eds., Pergamon Press, 5-31, 186pp.

Rabier, F., J.-N. Thépaut and P. Courtier, 1998: Extended assimilation and forecast experiments with a four-dimensional variational assimilation system. Quart. J. Roy. Meteor. Soc., 124, 1861-1887.

Sela, J., 1982: The NMC spectral model. NOAA technical report, NWS 30, NWS/NOAA, US Department of Commerce, 36pp.

Sela, J., 1987: The new T80 NMC operational spectral model. Preprint, the Eighth Conference on Numerical Weather Prediction, Baltimore, Maryland, February 22-26, 1988.

Shanno, D. F., 1970: Conditioning of quasi-Newton methods for function minimization. Math. of Comp., 24, 647-657.

Shor, N. Z., 1985: Minimization methods for nondifferentiable functions. Springer-Verlag (translated from Russian, 1979 by K. C. Kiwiel and A. Ruszczynski), 162pp.

Thépaut, J. N. and P. Courtier, 1991: 4-dimensional data assimilation using the adjoint of a multilevel primitive equation model. Quart. J. Roy. Meteor. Soc., 117, 1225-1254.

Tsuyuki, T., 1997: Variational data assimilation in the tropics using precipitation data. Part III: Assimilation of SSM/I precipitation rates. Mon. Wea. Rev., 125, 1447-1464.

Xiao, Q., X. Zou, and Y.-H. Kuo, 1999: Incorporating the SSM/I derived precipitable water vapor and rain rate into a numerical model: A case study for ERICA IOP-4 cyclone. Mon. Wea. Rev., (in press).

Zhang, S. and X. Zou, 1999: Further discussions on the use of adjoint of discontinuous model physics in 4D-Var. Submitted to Tellus.

Zou, X., 1997: Tangent linear and adjoint of "on-off" processes and their feasibility for use in 4-dimensional variational data assimilation. Tellus, 49A, 3-31.

Zou, X., I. M. Navon, M. Berger, Paul K. H. Phua, T. Schlick, and F. X. LeDimet, 1993: Numerical experience with limited-memory quasi-Newton and truncated-Newton methods. SIAM Journal on Optimization, 3, 582-608.

Zou, X., and Y.-H. Kuo, 1996: Rainfall assimilation through an optimal control of initial and boundary conditions in a limited-area mesoscale model. Mon. Wea. Rev., 124, 2859-2882.

Zou, X., H. Liu, J. Derber, J. G. Sela, R. Treaton, and B. Wang, 1999: Four-dimensional variational data assimilation with a full-physics version of the NCEP spectral model: System development and preliminary results. (submitted to Quart. J. Roy. Meteor. Soc.)

Zupanski, D., 1993: The effects of discontinuities in the Betts-Miller cumulus convection scheme on four-dimensional variational data assimilation in a quasi-operational forecasting environment. Tellus, 45A, 511-524.

Zupanski, D. and F. Mesinger, 1995: Four-dimensional variational assimilation of precipitation data. Mon. Wea. Rev., 123, 1112-1127.

FIGURE CAPTIONS

Fig. 1 The performances of the L-BFGS algorithm for minimizing the cost function J1 defined in (2) through (3), a piecewise differentiable cost function, starting from different initial guess points: a) x0 = 2, b) x0 = 2.29, c) x0 = 2.80 and d) x0 = 2.90. The time window includes 4 time steps, i.e., tR = t0 + 4Δt, Δt = 0.1. The cost function (solid curve) is obtained by evaluating the function at the different initial conditions with an interval of 0.01. The numbers in the figures are the iteration numbers and the black dots represent the value of J obtained at each iteration.

Fig. 2 Same as Fig. 1 except for the minimization starting from the initial guess point x0 = 2.55.

Fig. 3 The distributions of the cost function J1 defined in (2) through (1) with f1 = 2x 2 and f2(x) = x 4 (thin solid line), f2(x) = x 3.5 (thick dotted line) and f2(x) = x 3 (thick solid line).

Fig. 4 Distributions of the model forcing f(x) (see 6) with (dotted line) and without (solid line) smoothing when h = 0.1 and α = 100.

Fig. 5 Distributions of (a) the cost function and (b) the gradient with (dotted lines for h = 0.1 and α = 100) and without (dotted lines) smoothing.

Fig. 6 Same as Fig. 4 except for h = 0.2, α = 20 (thick dashed line) and h = 0.5, α = 5 (thick dotted-dashed line).

Fig. 7 Same as Fig. 5 except for h = 0.2, α = 20 (thick dashed line) and h = 0.5, α = 5 (thick dotted-dashed line).

Fig. 8 Same as Fig. 1 except that the local smoothing (see 6) is introduced at the switching point. A "2..." indicates the minimization process is stuck around the same point after 2 iterations.

Fig. 9 Same as Fig. 1 except that the nondifferentiable bundle algorithm is used and the initial guess values of a) x0 = 2.54, b) x0 = 2.57, c) x0 = 2.92 and d) x0 = 2.95 are used. The L-BFGS with and without smoothing failed in all these four cases.

Fig. 10 The solutions obtained by the L-BFGS method with (diamonds connected by dotted line) and without (dots connected by thin solid line) smoothing and by the bundle method (thick solid line) for the 49 cases for which the L-BFGS method with or without smoothing failed.

Fig. 11 Distributions of the temperature adjustments at model level 2 (solid line), 3 (dotted line) and 4 (dashed line) due to shallow convection for various values of the input temperature at the model level 3. The temperature interval for the input temperature at the model level 3 at which these calculations are made is 0.01°C.

Fig. 12 The logarithmic variations of the normalized values of the cost function (solid lines) and the gradient (dashed lines) with the number of iterations using the L-BFGS (thin solid and dotted lines) and the bundle (thick solid and dotted lines). The cost function is defined by (16) using shallow convection.

Fig. 13 The temperature profiles before and after the shallow-convection adjustments at (a) the 0th and the first iteration, (b) the second iteration, (c) the third iteration, and (d) the fourth iteration using the L-BFGS method. The true temperature profile, i.e., the profile which was used to generate "observations" using shallow convection, is presented in each panel (solid lines with stars).

Fig. 14 Same as Fig. 11 except using the nondifferentiable bundle method.

Fig. 15 Same as Fig. 11 except for the specific humidity variable.

Fig. 16 Same as Fig. 12 except for the specific humidity variable.

[Figure: J1(x0) plotted against the initial condition x0; panels a) and b).]

Publication date: 2000